Depleting Natural Resources Efficiently

Ernesto Carrella

October 20, 2016

Introduction

Problem

  • Modular fishing agent-based model
  • Where to fish?
  • Decisions are:
    • Geographical
    • Repeated
    • Myopic
    • Uninformed

Output

  • Many algorithms
  • Applicable to other ABMs
  • Fishing as a horse race

Bandit Algorithms

Bandit Algorithms

  • Multi-armed bandit problem:
    • Choose between \(K\) slot machines
    • Play game \(N\) times
    • Make most money
  • Bandit Algorithms:
    • Require no prior knowledge of the rewards
    • Performance degrades linearly with \(K\)

\(\epsilon\)-Greedy

# store the empirical means and play counts in arrays
means <- numeric(K)
ns <- integer(K)
# keep making choices
while(TRUE)
{
  # with probability epsilon, make a random choice
  if(runif(1) < epsilon)
    x <- sample(1:K, size = 1)
  else
    # otherwise choose the option with the highest empirical mean
    x <- which.max(means)
  # play x and get a random reward
  reward <- play(x)
  # update the empirical mean of x incrementally
  ns[x] <- ns[x] + 1
  means[x] <- means[x] + (reward - means[x]) / ns[x]
}

UCB-1 Algorithm

# store the empirical means
means <- numeric(K)
# store how many times each option was played
ns <- integer(K)
# keep track of how many games were played in total
t <- 0
# first play one game for each possible option
for(x in 1:K)
{
  reward <- play(x)
  ns[x] <- ns[x] + 1
  means[x] <- means[x] + (reward - means[x]) / ns[x]
  t <- t + 1
}
# now repeatedly choose the arm with the maximum upper confidence bound
while(TRUE)
{
  x <- which.max(means + sqrt(2 * log(t) / ns))
  reward <- play(x)
  ns[x] <- ns[x] + 1
  means[x] <- means[x] + (reward - means[x]) / ns[x]
  t <- t + 1
}

Boltzmann Exploration

# store the empirical means and play counts in arrays
means <- numeric(K)
ns <- integer(K)
# keep making choices
while(TRUE)
{
  # compute the sum of the exp function of the means
  denominator <- sum(exp(means))
  # probability of choosing arm i is proportional to exp of its empirical mean
  probabilities <- exp(means) / denominator
  # choose an arm according to those probabilities
  x <- sample(1:K, size = 1, prob = probabilities)
  # play x and get a random reward
  reward <- play(x)
  # update the empirical mean of x incrementally
  ns[x] <- ns[x] + 1
  means[x] <- means[x] + (reward - means[x]) / ns[x]
}

\[ p_i = \frac{e^{\mu_i}}{\sum_{j=1}^K e^{\mu_j}} \]

When to use?

  • Each spot on the map is a slot machine
  • Anecdotally they become quite useless once \(K \geq 50\)
  • Map size is 50×50, so \(K = 2500\)
  • Weak exploration:
    • Can only explore one arm at a time
    • All arms are treated as independent of one another
  • Improvements:
    • Parallelize exploration
    • Add structure

Parallel exploration

Parallel exploration

  • Exploring one arm at a time is very slow
  • ABM: multiple agents competing
  • Sharing information acts as a form of parallelization
  • Population optimization:
    • Effective
    • Somewhat unrealistic
    • Weak against dynamic problems

Gravitational Search - Impression

Agents trail the current best spot in a “conga line”.

Gravitational Search - Demo

Particle Swarm

# get pulled toward your most profitable friend
friend_velocity <- distance(friend, me)
# get pulled toward your best memory
memory_velocity <- distance(best_memory, me)
# update velocity and position
velocity <- old_velocity * inertia + alpha * friend_velocity + beta * memory_velocity + runif(1)
position <- position + velocity
play(position)

Particle Swarm - Demo

Explore-Exploit-Imitate

# with probability epsilon, explore
if(runif(1) < epsilon)
  # shock your position by up to delta in each direction
  position <- position + runif(1, min = -delta, max = delta)
else
  # if a friend is doing better, go where they are going
  if(profits[me] < profits[best_friend])
    position <- positions[best_friend]
  # otherwise just keep your current position
play(position)

Explore-Exploit-Imitate - Demo

Social Annealing

# if you are doing worse than the average, explore
if(profits[me] < mean(profits))
  # shock your position by up to delta in each direction
  position <- position + runif(1, min = -delta, max = delta)
play(position)

Social Annealing - Demo

Adding structure

Adding structure

  • Exploring one arm at a time is very slow
  • Assume some “spatial” correlation
    • Describe each observation in terms of features \(f\) \[ d(a,b) = \sum_i \frac{|f_i(a)-f_i(b)|}{h_i} \]
  • Recursive supervised learning
  • Heatmaps

Nearest Neighbors

Nearest Neighbors - How

  • Predict the “closest” known example \[ d(a,b) = \sum_i \frac{|f_i(a)-f_i(b)|}{h_i} \]
  • Implement through k-d tree
    • Search: \(O(\log N)\)
    • Insertion: \(O(\log N)\)
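
The lookup can be sketched as follows (in Python, since the deck's snippets are pseudocode); the toy observations and bandwidths `h` are illustrative, and a real implementation would use a k-d tree rather than this linear scan:

```python
# 1-nearest-neighbour prediction over a list of (features, reward) pairs
def distance(a, b, h):
    # d(a, b) = sum_i |f_i(a) - f_i(b)| / h_i
    return sum(abs(fa - fb) / hi for fa, fb, hi in zip(a, b, h))

def predict(x, observations, h):
    # return the reward of the closest stored observation
    features, reward = min(observations, key=lambda obs: distance(obs[0], x, h))
    return reward

# toy heatmap: features are (longitude, latitude), value is observed profit
obs = [((0.0, 0.0), 10.0), ((5.0, 5.0), 2.0)]
print(predict((1.0, 1.0), obs, h=(1.0, 1.0)))  # nearest to (0, 0), so 10.0
```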

Nearest Neighbors - Demo

Kernel Regression

Kernel Regression - How

  • Predict a weighted average of known observations
  • Weight shrinks with distance \[ y^* = \frac{\sum_i K(x_i,x^*)y_i}{\sum_i K(x_i,x^*)} \]
  • Degrades faster:
    • Search: \(O(NF)\)
    • Insertion: \(O(1)\)
    • Rolling memory: last \(N=100\) observations
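
A minimal Nadaraya–Watson sketch of this weighted average (Python, since the deck's snippets are pseudocode; the Gaussian kernel and bandwidths are illustrative assumptions):

```python
import math

def kernel(a, b, h):
    # Gaussian kernel of the bandwidth-scaled L1 distance
    d = sum(abs(fa - fb) / hi for fa, fb, hi in zip(a, b, h))
    return math.exp(-d * d)

def predict(x, observations, h):
    # weighted average of every stored observation (Nadaraya-Watson)
    weights = [kernel(xi, x, h) for xi, _ in observations]
    return sum(w * y for w, (_, y) in zip(weights, observations)) / sum(weights)

obs = [((0.0,), 10.0), ((10.0,), 0.0)]
# the far-away observation gets an almost-zero weight
print(round(predict((0.0,), obs, h=(1.0,)), 3))
```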

Kernel Regression - Demo

Kernel Transduction - How

  • Faster search, slower insertion
  • Exponential forgetting
  • Recursive Formulation: \[ \begin{aligned} g_n(x) &= \lambda g_{n-1}(x) + K_h(x,x_n) \\ m_n(x) &= m_{n-1}(x) + \frac{ \left(Y_n - m_{n-1}(x) \right) K_{h}(x,x_n)}{g_n(x)} \end{aligned} \]
    • Search: \(O(1)\)
    • Insertion: \(O(KF)\)
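
The recursion can be sketched directly from the two formulas above (Python, since the deck's snippets are pseudocode; the kernel, bandwidths and forgetting factor `lam = 0.99` are illustrative):

```python
import math

def kernel(x, xn, h):
    d = sum(abs(a - b) / hi for a, b, hi in zip(x, xn, h))
    return math.exp(-d * d)

def update(g, m, x, xn, yn, h, lam=0.99):
    # g_n(x) = lambda * g_{n-1}(x) + K_h(x, x_n)
    k = kernel(x, xn, h)
    g = lam * g + k
    # m_n(x) = m_{n-1}(x) + (Y_n - m_{n-1}(x)) * K_h(x, x_n) / g_n(x)
    m = m + (yn - m) * k / g
    return g, m

# repeatedly observing a reward of 5 at the queried cell drives its estimate to 5
g, m = 0.0, 0.0
for _ in range(200):
    g, m = update(g, m, x=(0.0,), xn=(0.0,), yn=5.0, h=(1.0,))
print(round(m, 3))  # 5.0
```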

Kernel Transduction - Demo

Kalman Filters

Kalman Filters - How

  • Hidden Markov Chain estimators
  • Useful if you have model knowledge
  • Flexible but expensive
    • Search: \(O(1)\)
    • Insertion: \(O(KF^3)\)
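
A one-dimensional sketch of the predict–update cycle for a single map cell (Python, since the deck's snippets are pseudocode; the observation and drift variances are illustrative assumptions):

```python
def kalman_step(mean, var, obs, obs_var=1.0, drift_var=0.1):
    # predict: the cell's hidden profitability may have drifted since last time
    var = var + drift_var
    # update: blend the prediction and the new observation by their precisions
    gain = var / (var + obs_var)
    mean = mean + gain * (obs - mean)
    var = (1.0 - gain) * var
    return mean, var

mean, var = 0.0, 100.0  # vague prior over one cell's profitability
for obs in [4.0, 5.0, 6.0]:
    mean, var = kalman_step(mean, var, obs)
print(round(mean, 2))  # settles near the recent observations
```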

Kalman Filters - Demo

Geographically Weighted Regressions

Geographically Weighted Regressions - How

  • Each cell has its own LOWESS estimator
  • Weighs features automatically
  • Exponential Forgetting
  • Recursive but expensive
    • Search: \(O(1)\)
    • Insertion: \(O(KF^3)\)
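
A scalar sketch of recursive least squares with exponential forgetting, the recursive building block behind a per-cell estimator like this (Python, since the deck's snippets are pseudocode; the single feature, forgetting factor and prior are illustrative):

```python
def rls_step(beta, P, x, y, lam=0.98):
    # recursive least squares with exponential forgetting, one feature
    gain = P * x / (lam + x * P * x)
    beta = beta + gain * (y - beta * x)
    P = (P - gain * x * P) / lam
    return beta, P

beta, P = 0.0, 1000.0  # vague prior over the cell's local slope
for x, y in [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]:
    beta, P = rls_step(beta, P, x, y)
print(round(beta, 2))  # close to the underlying slope of 2
```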

Geographically Weighted Regressions - Demo

Easy Bayes

Easy Bayes - How

\[ \text{Pr}(\text{good}|x) = \frac{\text{Pr}(\text{x}|\text{good})\text{Pr}(\text{good}) }{ \text{Pr}(\text{x}|\text{good})\text{Pr}(\text{good}) + \text{Pr}(\text{x}|\text{bad})\text{Pr}(\text{bad}) } \]

  • Easy if prior and posterior are normal
  • Ad-Hoc but fast
    • Search: \(O(1)\)
    • Insertion: \(O(K)\)
  • To my great chagrin
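
The posterior above is easy to compute when both likelihoods are normal; a minimal sketch (Python, since the deck's snippets are pseudocode; the class means, standard deviations and prior are illustrative assumptions):

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def p_good(x, prior_good=0.5, good=(10.0, 2.0), bad=(2.0, 2.0)):
    # Bayes' rule with normal likelihoods for "good" and "bad" fishing spots
    num = normal_pdf(x, *good) * prior_good
    den = num + normal_pdf(x, *bad) * (1.0 - prior_good)
    return num / den

print(round(p_good(9.0), 3))  # a catch of 9 almost certainly came from a good spot
```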

Easy Bayes - Demo

Even more structure: Planning

  • Search vs Planning
  • Profit function \[ p \cdot \text{Catches} - q \cdot \text{Gas Consumed} \]
  • Different Heatmaps
  • A little knowledge is a dangerous thing

Advantage (part 1)

Advantage (part 2)

Tuning

Exogenous Tuning

  • The learning parameters are fixed from the outside
  • Optimize for:
    • Data fit
    • Maximum Profits (deprecated)
  • Assumes parameters are fixed

Social Tuning

  • The learning parameters are found as a group
  • Meta Explore-Exploit-Imitate:
    • Everyone starts with different parameters
    • Explore or imitate fishers doing better
  • Assumes such imitations are possible

Personal Tuning

  • The learning parameters are found individually
  • Gradient descent:
    • At every new observation, search for the parameters that would have predicted it
  • Too slow
  • Swayed by outliers

Horse Races

Baseline Scenario

Fines Imposed

California Model